Das Ziel ist es, aus dem Datacamp Datensatz [Video Game Sales Date] (https://app.datacamp.com/workspace/datasets/dataset-python-video-games-sales), , folgende Fragestellung / Hypothese zu beantworten:

Fragestellung: “Ist es wahrscheinlicher, dass sich bestimmte Spielgenres in Europa signifikant (Unterschied von 50%) besser verkaufen lassen als im Japanischen und Nordamerikanischen Markt?”

Als Einführung werden wir auf Datacamp folgende Kurse durchgehen:

# Import libraries
library("plotly")
library("ggplot2")
library("plyr")
library("dplyr")

Daten einlesen und Dataframe erstellen

# Read csv from folder "data"
df = read.csv("./data/video_games_data.csv")

head(df, 10)

Data Wrangling

Bevor wir mit den Visualisierungen und Modelle beginnen können, müssen wir die Daten säubern. Das heisst es sollte keine Duplikate geben, fehlende Werte sollten korrekt eingetragen werden.

Es hat “N/A” Werte in den Spalten “Year” und “Publisher”. Diese Werte sollten korrekte “NA” Werte sein, damit sie bei den Visualisierungen und Berechnungen nicht berücksichtigt werden.

# Show rows with "N/A" values
df[grep("N/A", df$Publisher),]
df[grep("N/A", df$Year),]
# Replace "N/A" with "NA"
df[df == "N/A"] <- NA
# Check if values have been converted
df %>% 
  summarize(across(everything(), ~sum(is.na(.))))
df$Year
   [1] 2006 1985 2008 2009 1996 1989 2006 2006 2009 1984 2005 2005 1999 2007 2009 2010 2013 2004 1990 2005 2006 1989 1988 2013 2002 2002 2010 2005 2001 2011 1998
  [32] 2010 2013 2015 2012 2012 2009 2011 2001 2008 2010 2005 2011 2007 2014 2009 1996 2004 2007 2014 1992 2008 1997 2011 2010 2009 2008 1993 2004 2004 2011 2013
  [63] 2010 1996 2012 2012 1997 2013 2010 1999 2007 1994 2013 2012 2007 2011 1992 2015 2010 2004 2007 2012 2012 2009 1997 2007 2001 1999 2008 1982 2005 2010 2015
  [94] 2014 1998 1997 1988 2014 2008 2011 2006 2015 2009 2011 2003 1998 1998 2001 2014 2015 2003 2012 2013 2004 2002 2011 1996 2010 2008 2009 2007 2011 2005 2010
 [125] 2014 2010 2011 1986 2012 2001 2004 2002 2000 2009 2010 2002 1992 2004 1999 2010 2011 2006 2001 1992 2008 1998 2013 2006 2008 2009 2007 1998 1996 2006 2014
 [156] 1988 2009 1999 2007 2011 2009 2008 2001 2001 2001 1999 2008 2001 2014 2009 1989 1999 2012 2000 2003 1999 2009 2007 1997   NA 2004 2012 2003 2012 1994 2012
 [187] 2007 1995 2013 2008 2003 1992 2010 2014 1996 2006 2009 2014 2006 2010 1999 1996 1999 2014 1999 1998 2007 2007 2008 2007 2007 2005 2007 1997 2010 1990 2010
 [218] 2010 2007 2014 1999 2016 2011 2000 2003 2000 2008 2002 2011 1996 2007 1991 2002 2009 2013 2015 2014 2007 2004 1981 2006 2002 2000 2015 2014 2005 2012 2010
 [249] 2001 2006 2006 1987 2005 2006 1991 2005 2013 2014 1980 2008 2015 2013 2009 2011 2001 2004 2006 1995 2011 2005 2003 2016 2007 2011 2009 2011 1989 1984 1997
 [280] 2002 2014 1997 1995 2006 2011 1992 2007 2015 2007 2010 2007 2007 2005 1996 2000 2014 1997 2014 1984 2008 2007 2009 2005 2009 2014 1994 2003 2013 2014 2006
 [311] 2011 2005 1988 2010 2015 2004 2007 1995 1992 2009 2007 2011 2015 2011 2008 1996 2007 2013 2002 2007 2001 2015 1999 2000 2004 1998 2013 2007 2009 2002 2005
 [342] 1998 2002 2008 2013 2001 1999 2009 2007 2013 2005 2016 2004 1994 2001 2007 2001 2005 2000 1997 2006 2002 1997 2001 2005 2013 2010 2000 1999 2008 2009 2009
 [373] 1996 2010 1985 2009 2006   NA 2015 2010 2008 2014 2012 2013 2009 2002 2013 2011 1998 1994 2004 1990 2002 2000 2008 2015 2009 1998 2000 2008 1998 2013 2006
 [404] 2002 2002 2012 2005 2001 2003 2015 2014 2003 2008 2009 2015 1995 2015 2011 2015 2014 1999 1983 1995 2000 2007 2008 1995 2010 2003 2015 2010   NA 2008 2001
 [435] 2008 2001 2008 2005 2012 2014 2007 2000 1998 1999 2006 2007 1990 2010 2010 2014 2002 2001 1986 2012 2013 1994 2002 2012 2008 2012 2010 2008 1995 2009 2002
 [466] 1987 2002 2015 2012 2007   NA 2014 1993 2000 2002 2001 2015 2010 2014 2011 2003 2008 2002 2009 1996 2003 2005 2009 2001 2007 2006 2007 2002 2013 2014 2004
 [497] 2008 2003 2012 2001 2004 2011 1999 2007 1998 2013 2012 2002 2011 1990 1997 1996 2011 2015 2010 2007 2009 2010 2002 2002 2003 1994 2010 2007 2003 2008 2006
 [528] 1997 2011 1998 2001 1992 1996 2011 2001 2009 2011 2003 2014 2003 2009 2003 2007 2011 1980 1992 2013 2000 1986 2011 2006 2002 2001 2015 1998 2011 2006 2010
 [559] 2010 2010 2004 2014 2009 2011 1992 2005 2008 1998 2004 2006 2003 2012 2006 1986 1996 2007 2000 2006 2009 2009 2004 2007 2010 2009 2011 2008 2002 2010 1990
 [590] 2008 2007 2004 2009 2013 2009 2007 2006 2004 2010 2009 2007 2001 1987 2012 2003 2008 2008   NA 2001 2000 2007 1986 2013 2006 2008 1999 2011 2003 2010 2002
 [621] 2008 2008 1998 2012   NA 2013 2002 2010 2008 1992 1997 1996 2011 2006 2003 1997 2008 1998 1988 2010 2003 2001 1998 2014 2007 2015 1996 1987 1997   NA 2015
 [652] 2010   NA 1994 2013 2011 2005 2007 2011 1998 2015 2014 2010 2001 2007 2004 2004 2004 2009 2011 1995 2010 1996 1998 2008 2013 2007 2008 2001 2010 2015 2015
 [683] 2010 2000 1997 2004 2004 1999 1995 1998 2013 2012 2008 2011 1997 2013 2004 2009 2014 2015 1983 2008 2008 2012 1997 2012 2009 2006 2002 2001 2007   NA 2007
 [714] 2014 2002 1997 2004 2011 2009 2008 2001 2001 1990 2008 2009 2013 2003 2009 1998 2010 2010 2013 2006 2008 1981 2007 2000 1986 2008 2007 2011 2006 2003 1992
 [745] 2000 2012 2000 2002 2013 2013 2006 2003 2010 1998 1984 2001 2009 2000 2015 1987 2014 2007 1996 1983 1988 1992 1981 2004 2006 2003 2016 2010 2006 2010 2001
 [776] 1989 2011 2009 2000 2008 2002 2010   NA 2003 2011 1998 2002 2002 2005 2011 2005 1997 2006 2003 2010 2009 2006 2015 2010 2010 2002 2008 2007 2003 2010 1998
 [807] 1994 2010 2002 2010 2009 2011 2001 2007 2010 2010 2005 2004 1998 2003 2013 2007 2008 2013 2005 2002 2013 1992 2007 2009 2011 2008 2006 2005 2003 2011 2014
 [838] 1998 2011 2003 2008 2005 2014 2012 2007 2016 2013 2012 2014 1993 2008 1997 1989 2002 2002 1991 2013 2012 2014 2010 2002 2008 2003 2008 1981 2013 2004 1986
 [869] 1985 2006 2006 2014 2003 2005 1994 2003 2003 2008 1999 2001 1989 2004 1997 2002 2004 2013 2008 1993 1995 1998 2008 2014 2008 2013 2001 2009 2008 2001 2003
 [900] 2004 1996 2009 2000 2008 2008 2015 2011 2001 2012 2013 2008 2013 2009 1999 2004 1999 2005 1998 2007 2002 1998 2001 2013 2001 2005 2004 1999 1997 2004 2014
 [931] 2001 2009 1999 2010 2013 2004 2007 2004 1984 2011 2001 2009 2005 1998 2005 2012 1996 2015 1993 2007 1995 2014 1994 2001 2001 2007 2010 2014 2003 2000 2002
 [962] 2004 2008 2002 2006 2010 1999 2015 1994 2013 2012 2003 2006 2001 2004 2011 2010 2012 2013 2005 2007 2013 2013 1991 2000 2001 2009 1991 1998 2004 2001 1994
 [993] 1986 2004 2002 2002 2013 2012 2007 2015
 [ reached getOption("max.print") -- omitted 15598 entries ]

Erste Plots erstellen

# group by genre and summarize game sales to each region
df_genre <- df %>%
  group_by(Genre) %>%
  summarize(
    sum_NA = sum(NA_Sales),
    sum_EU = sum(EU_Sales), 
    sum_JP = sum(JP_Sales),
    sum_other = sum(Other_Sales),
    sum_global = sum(Global_Sales)
  )

fig <- plot_ly(
  df_genre, y = ~Genre, x = ~sum_NA, type = "bar", name = "NA", width = 1000, height = 800) %>% 
  add_trace(x = ~sum_EU, name = "EU") %>%
  add_trace(x = ~sum_JP, name = "Japan") %>%
  add_trace(x = ~sum_other, name = "Other") %>%
  layout(
    title = "Video Game Sales by Genre",
    xaxis = list(title = "Sales (million)"),
    barmode = "group"
  )

fig
df_year_genre <- df %>%
  group_by(Year, Genre) %>%
  summarize(
    sum_NA = sum(NA_Sales),
    sum_EU = sum(EU_Sales), 
    sum_JP = sum(JP_Sales),
    sum_other = sum(Other_Sales),
    sum_global = sum(Global_Sales)
  )

df_year_genre
LS0tDQp0aXRsZTogIlJlZ3Jlc3Npb24gbW9kZWxzIHdpdGggUiINCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tDQoNCg0KDQpEYXMgWmllbCBpc3QgZXMsIGF1cyBkZW0gRGF0YWNhbXAgRGF0ZW5zYXR6IFtWaWRlbyBHYW1lIFNhbGVzIERhdGVdIChodHRwczovL2FwcC5kYXRhY2FtcC5jb20vd29ya3NwYWNlL2RhdGFzZXRzL2RhdGFzZXQtcHl0aG9uLXZpZGVvLWdhbWVzLXNhbGVzKSwgLCBmb2xnZW5kZSBGcmFnZXN0ZWxsdW5nIC8gSHlwb3RoZXNlIHp1IGJlYW50d29ydGVuOg0KDQoNCiMjIyBGcmFnZXN0ZWxsdW5nOiAiSXN0IGVzIHdhaHJzY2hlaW5saWNoZXIsIGRhc3Mgc2ljaCBiZXN0aW1tdGUgU3BpZWxnZW5yZXMgaW4gRXVyb3BhIHNpZ25pZmlrYW50IChVbnRlcnNjaGllZCB2b24gNTAlKSBiZXNzZXIgdmVya2F1ZmVuIGxhc3NlbiBhbHMgaW0gSmFwYW5pc2NoZW4gdW5kIE5vcmRhbWVyaWthbmlzY2hlbiBNYXJrdD8iDQoNCg0KQWxzIEVpbmbDvGhydW5nIHdlcmRlbiB3aXIgYXVmIERhdGFjYW1wIGZvbGdlbmRlIEt1cnNlIGR1cmNoZ2VoZW46DQoNCi0gW0ludHJvZHVjdGlvbiB0byBSZWdyZXNzaW9uIGluIFJdKGh0dHBzOi8vYXBwLmRhdGFjYW1wLmNvbS9sZWFybi9jb3Vyc2VzL2ludHJvZHVjdGlvbi10by1yZWdyZXNzaW9uLWluLXIpDQoNCi0gW0ludGVybWVkaWF0ZSBSZWdyZXNzaW9uIGluIFJdKGh0dHBzOi8vYXBwLmRhdGFjYW1wLmNvbS9sZWFybi9jb3Vyc2VzL2ludGVybWVkaWF0ZS1yZWdyZXNzaW9uLWluLXIpDQoNCg0KYGBge3J9DQojIEltcG9ydCBsaWJyYXJpZXMNCmxpYnJhcnkoInBsb3RseSIpDQpsaWJyYXJ5KCJnZ3Bsb3QyIikNCmxpYnJhcnkoInBseXIiKQ0KbGlicmFyeSgiZHBseXIiKQ0KYGBgDQoNCiMjIyBEYXRlbiBlaW5sZXNlbiB1bmQgRGF0YWZyYW1lIGVyc3RlbGxlbg0KDQpgYGB7cn0NCiMgUmVhZCBjc3YgZnJvbSBmb2xkZXIgImRhdGEiDQpkZiA9IHJlYWQuY3N2KCIuL2RhdGEvdmlkZW9fZ2FtZXNfZGF0YS5jc3YiKQ0KDQpoZWFkKGRmLCAxMCkNCmBgYA0KIyMjIERhdGEgV3JhbmdsaW5nDQpCZXZvciB3aXIgbWl0IGRlbiBWaXN1YWxpc2llcnVuZ2VuIHVuZCBNb2RlbGxlIGJlZ2lubmVuIGvDtm5uZW4sIG3DvHNzZW4gd2lyIGRpZSBEYXRlbiBzw6R1YmVybi4gRGFzIGhlaXNzdCBlcyBzb2xsdGUga2VpbmUgRHVwbGlrYXRlIGdlYmVuLCBmZWhsZW5kZSBXZXJ0ZSBzb2xsdGVuIGtvcnJla3QgZWluZ2V0cmFnZW4gd2VyZGVuLg0KDQpFcyBoYXQgIk4vQSIgV2VydGUgaW4gZGVuIFNwYWx0ZW4gIlllYXIiIHVuZCAiUHVibGlzaGVyIi4gRGllc2UgV2VydGUgc29sbHRlbiBrb3JyZWt0ZSAiTkEiIFdlcnRlIHNlaW4sIGRhbWl0IHNpZSBiZWkgZGVuIFZpc3VhbGlzaWVydW5nZW4gdW5kIEJlcmVjaG51bmdlbiBuaWNodCBiZXLDvGNrc2ljaHRpZ3Qgd2VyZGVuLg0KDQpgYGB7cn0NCiMgU2hvdyByb3dzIHdpdGggIk4vQSIgdmFsdWVzDQpkZltncmVwKCJOL0EiLCBkZiRQdWJsaXNoZXIpLF0NCmRmW2dyZXAoIk4vQSIsIGRmJFllYXIpLF0NCmBgYA0KDQpgYGB7cn0NCiMgUmVwbGFjZSAiTi9BIiB3aXRoICJOQSINCmRmW2RmID09ICJOL0EiXSA8LSBOQQ0KYGBgDQoNCmBgYHtyfQ0KIyBDaGVjayBpZiB2YWx1ZXMgaGF2ZSBiZWVuIGNvbnZlcnRlZA0KZGYgJT4lIA0KICBzdW1tYXJpemUoYWNyb3NzKGV2ZXJ5dGhpbmcoKSwgfnN1bShpcy5uYSguKSkpKQ0KYGBgDQoNCmBgYHtyfQ0KIyBTZXQgZGF0YSB0byBjb3JyZWN0IHR5cGUNCmRmJEdlbnJlIDwtIGFzLmZhY3RvcihkZiRHZW5yZSkNCmRmJFllYXIgPC0gYXMubnVtZXJpYyhkZiRZZWFyKQ0KYGBgDQoNCiMjIyBFcnN0ZSBQbG90cyBlcnN0ZWxsZW4NCg0KYGBge3J9DQojIGdyb3VwIGJ5IGdlbnJlIGFuZCBzdW1tYXJpemUgZ2FtZSBzYWxlcyB0byBlYWNoIHJlZ2lvbg0KZGZfZ2VucmUgPC0gZGYgJT4lDQogIGdyb3VwX2J5KEdlbnJlKSAlPiUNCiAgc3VtbWFyaXplKA0KICAgIHN1bV9OQSA9IHN1bShOQV9TYWxlcyksDQogICAgc3VtX0VVID0gc3VtKEVVX1NhbGVzKSwgDQogICAgc3VtX0pQID0gc3VtKEpQX1NhbGVzKSwNCiAgICBzdW1fb3RoZXIgPSBzdW0oT3RoZXJfU2FsZXMpLA0KICAgIHN1bV9nbG9iYWwgPSBzdW0oR2xvYmFsX1NhbGVzKQ0KICApDQoNCmZpZyA8LSBwbG90X2x5KA0KICBkZl9nZW5yZSwgeSA9IH5HZW5yZSwgeCA9IH5zdW1fTkEsIHR5cGUgPSAiYmFyIiwgbmFtZSA9ICJOb3J0aCBBbWVyaWNhIiwgd2lkdGggPSAxMDAwLCBoZWlnaHQgPSA4MDApICU+JSANCiAgYWRkX3RyYWNlKHggPSB+c3VtX0VVLCBuYW1lID0gIkV1cm9wZSIpICU+JQ0KICBhZGRfdHJhY2UoeCA9IH5zdW1fSlAsIG5hbWUgPSAiSmFwYW4iKSAlPiUNCiAgYWRkX3RyYWNlKHggPSB+c3VtX290aGVyLCBuYW1lID0gIk90aGVyIikgJT4lDQogIGxheW91dCgNCiAgICB0aXRsZSA9ICJWaWRlbyBHYW1lIFNhbGVzIGJ5IEdlbnJlIiwNCiAgICB4YXhpcyA9IGxpc3QodGl0bGUgPSAiU2FsZXMgKG1pbGxpb24pIiksDQogICAgYmFybW9kZSA9ICJncm91cCINCiAgKQ0KDQpmaWcNCmBgYA0KDQpgYGB7cn0NCmRmX3llYXJfZ2VucmUgPC0gZGYgJT4lDQogIGdyb3VwX2J5KFllYXIsIEdlbnJlKSAlPiUNCiAgc3VtbWFyaXplKA0KICAgIHN1bV9OQSA9IHN1bShOQV9TYWxlcyksDQogICAgc3VtX0VVID0gc3VtKEVVX1NhbGVzKSwgDQogICAgc3VtX0pQID0gc3VtKEpQX1NhbGVzKSwNCiAgICBzdW1fb3RoZXIgPSBzdW0oT3RoZXJfU2FsZXMpLA0KICAgIHN1bV9nbG9iYWwgPSBzdW0oR2xvYmFsX1NhbGVzKQ0KICApDQoNCmRmX3llYXJfZ2VucmUNCmBgYA0KYGBge3J9DQoNCmBgYA0KDQo=